feat(tts): add support for the Alibaba Cloud TTS providers CosyVoice TTS (API) and Qwen TTS Realtime (API), and add filtering of content in TTS text by yuxwd · Pull Request #7651 · AstrBotDevs/AstrBot

yuxwd · 2026-04-18T15:04:32Z

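Modifications

New TTS providers

The project's native Alibaba Cloud TTS provider has incomplete coverage. This change adds support for CosyVoice TTS (API) and Qwen TTS Realtime (API).

Filtering content in TTS text

Improves the TTS that the bot sends by adding a filter for content in TTS text, so that TTS does not read out content inside （）. Regex filtering is supported.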

Code testing

Tested on macOS and Linux (Alibaba Cloud Linux 3.2104 LTS 64-bit) with no issues.

This is NOT a breaking change.

Screenshots or Test Results

Checklist

😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
😮 My changes do not introduce malicious code.

Summary by Sourcery

Add new Alibaba Cloud TTS providers and introduce configurable text filtering before TTS synthesis.

New Features:

Add Qwen TTS Realtime WebSocket streaming provider with configurable voice and prosody options.
Add CosyVoice TTS provider using DashScope with model, voice, and prosody configuration.
Introduce configurable TTS text filtering to strip bracketed/marked content and optional custom regex rules for both normal and live TTS modes.

Enhancements:

Wire new TTS providers and text filter settings into provider configuration metadata and dynamic provider loading so they are available in the dashboard UI and runtime.

Tests:

Add unit tests for the TTS text filtering utility and its filtered queue wrapper to validate built-in and custom rules and queue behavior.

Add two new TTS providers using Alibaba Cloud DashScope SDK:
- Qwen TTS Realtime: WebSocket streaming TTS with low latency, supports qwen3-tts-flash-realtime and qwen3-tts-instruct-flash-realtime models
- CosyVoice TTS: Non-streaming TTS with multiple voice options, supports cosyvoice-v3.5/v3/v2 models

Includes config templates, provider manager integration, and i18n translations (zh-CN, en-US, ru-RU).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

sourcery-ai

Sorry, we are unable to review this pull request

The GitHub API does not allow us to fetch diffs exceeding 300 files, and this pull request has 600

gemini-code-assist

Code Review

This pull request introduces a TTS text filtering mechanism to strip markers like brackets and asterisks from text before synthesis, and adds support for Qwen TTS Realtime and CosyVoice TTS providers. The review feedback points out several critical issues: a logic error in the Qwen streaming implementation that causes audio duplication, a blocking call in an asynchronous function that could impact responsiveness, and incorrect usage of the DashScope SDK in the CosyVoice provider. Additionally, the FilteredQueue implementation requires a call to the base class constructor to ensure all inherited methods function correctly.
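
The "blocking call in an asynchronous function" point is commonly addressed by moving the synchronous call onto a worker thread. The sketch below only illustrates that general pattern: `blocking_synthesize` is a hypothetical stand-in for a synchronous SDK call, not a DashScope API, and the ticker exists purely to show that the event loop keeps running.

```python
import asyncio
import time


def blocking_synthesize(text: str) -> bytes:
    """Stand-in for a synchronous SDK call (hypothetical, not a DashScope API)."""
    time.sleep(0.05)  # simulates network latency
    return text.encode("utf-8")


async def synthesize(text: str) -> bytes:
    # Run the blocking call in the default thread pool so the event loop stays free.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_synthesize, text)


async def demo() -> tuple[bytes, int]:
    ticks = 0

    async def ticker() -> None:
        # Counts how often the event loop gets to run while synthesis is in flight.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.005)
            ticks += 1

    task = asyncio.create_task(ticker())
    audio = await synthesize("hello")
    task.cancel()
    return audio, ticks
```

If `blocking_synthesize` were called directly inside `synthesize`, the ticker would not advance until the call returned.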

yuxwd · 2026-04-18T15:39:45Z

Server test results

…L warning log 2. FilteredQueue no longer inherits from asyncio.Queue and uses composition instead; a warning is logged when a custom regex is invalid

gemini-code-assist

Code Review

This pull request introduces a TTS text filtering mechanism to remove emotion markers and brackets before synthesis, alongside adding support for Qwen TTS Realtime and CosyVoice TTS providers. The changes include a new TTSTextFilter utility, a FilteredQueue wrapper for streaming scenarios, and corresponding configuration and localization updates. Feedback identifies a logic error in the Qwen provider causing redundant text synthesis, potential issues with empty strings in the filtered queue, and suggestions for optimizing imports and correcting type hints.

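- Remove the duplicated accumulated_text send logic from Qwen TTS Realtime get_audio_stream so that text is not synthesized twice
- FilteredQueue.get() now skips an item whose filtered result is an empty string and fetches the next one, so TTS never receives empty text
- Move import struct in _pcm_to_wav to the top of the file to avoid repeated import overhead in the streaming path
- Correct the tts_text_queue type declaration to asyncio.Queue[str | None] | FilteredQueue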
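
For readers unfamiliar with what a helper like `_pcm_to_wav` does, here is a minimal sketch of wrapping raw PCM in a WAV container. It is not the PR's implementation: the PR builds the header with `struct`, whereas this version swaps in the standard library's `wave` module, and the 16-bit mono format and 24000 Hz default are assumptions.

```python
import io
import wave


def pcm_to_wav(pcm_data: bytes, sample_rate: int = 24000) -> bytes:
    """Wrap raw PCM in a WAV container (assumes 16-bit mono samples)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_data)
    # wave does not close a file object it did not open, so the buffer is still readable.
    return buffer.getvalue()
```

Because the imports sit at module level, calling the helper repeatedly on a streaming path incurs no per-call import cost.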

yuxwd · 2026-04-25T09:47:53Z

@gemini-code-assist review

gemini-code-assist

Code Review

This pull request introduces a TTS text filtering mechanism to remove emotion markers (like brackets) from text before synthesis, along with support for two new TTS providers: Qwen TTS Realtime and CosyVoice TTS via the DashScope API. The filtering logic includes a new utility class and an asynchronous queue wrapper, supported by configuration updates, dashboard translations, and unit tests. Feedback suggests optimizing configuration retrieval in the result decoration stage to avoid redundant lookups and adding type hints to the new queue wrapper for better maintainability.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

yuxwd · 2026-04-25T09:57:22Z

@gemini-code-assist review

gemini-code-assist

Code Review

This pull request introduces a TTS text filtering system to remove emotion markers and bracketed content from text before speech synthesis. It also adds support for two new TTS providers, Qwen TTS Realtime and CosyVoice TTS, along with their respective configurations and localization updates. Review feedback highlights an efficiency improvement by moving configuration retrieval outside of a loop and a critical fix in the FilteredQueue to ensure task_done() is called for filtered items to prevent potential deadlocks.
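
The `task_done()` concern can be illustrated with a small composition-based wrapper. This is a sketch under assumptions, not the PR's `FilteredQueue`: the class name `FilteringQueue`, its constructor arguments, and the demo filter are hypothetical.

```python
import asyncio


class FilteringQueue:
    """Hypothetical wrapper: filters strings on get() and keeps join() accurate."""

    def __init__(self, queue: asyncio.Queue, text_filter) -> None:
        self._queue = queue
        self._filter = text_filter

    async def get(self):
        while True:
            item = await self._queue.get()
            if item is None:
                return None  # end-of-stream sentinel, passed through unchanged
            filtered = self._filter(item)
            if filtered:
                return filtered
            # The item was dropped, so the consumer never sees it. Acknowledge it
            # here, otherwise queue.join() would wait forever.
            self._queue.task_done()

    def task_done(self) -> None:
        self._queue.task_done()


async def demo() -> list[str]:
    raw: asyncio.Queue = asyncio.Queue()
    for item in ["(sigh)", "hello", None]:
        raw.put_nowait(item)
    wrapped = FilteringQueue(raw, lambda s: "" if s.startswith("(") else s)
    seen = []
    while True:
        text = await wrapped.get()
        wrapped.task_done()
        if text is None:
            break
        seen.append(text)
    # Completes only because the dropped item was acknowledged inside get().
    await asyncio.wait_for(raw.join(), timeout=1)
    return seen
```

Without the `task_done()` call inside `get()`, the final `raw.join()` in `demo()` would time out, which is the deadlock the review describes.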

yuxwd · 2026-04-25T14:17:45Z

@sourcery-ai review

sourcery-ai

Hey - I've found 3 issues, and left some high level feedback:

In cosyvoice_tts_source.py, SpeechSynthesizer is imported unconditionally at module import time, unlike the guarded import pattern used in qwen_tts_realtime_source.py; consider wrapping this import in a try/except ImportError with a clear error message so environments without dashscope don’t crash on import.
The built-in patterns in TTSTextFilter.BUILTIN_PATTERNS aggressively strip all *...* and **...** segments, which may remove legitimate markdown emphasis rather than just emotion markers; consider either narrowing these regexes or making markdown-based stripping configurable to avoid unexpected loss of content.
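
One possible way to act on the second point is to make emphasis stripping opt-in. The sketch below is only an illustration of that idea: the patterns, the `filter_text` name, and the `strip_emphasis` flag are hypothetical, and the PR's actual `BUILTIN_PATTERNS` are not reproduced here.

```python
import re

# Hypothetical patterns for illustration only.
BRACKET_PATTERN = re.compile(r"（[^（）]*）|\([^()]*\)")
# Try **...** before *...* so bold markers are consumed as a pair.
EMPHASIS_PATTERN = re.compile(r"\*\*([^*]+)\*\*|\*([^*]+)\*")


def filter_text(text: str, strip_emphasis: bool = False) -> str:
    """Strip bracketed content; drop *...* segments only when asked to."""
    result = BRACKET_PATTERN.sub("", text)
    if strip_emphasis:
        # Treat *...* as an emotion marker and remove it entirely.
        result = EMPHASIS_PATTERN.sub("", result)
    else:
        # Treat *...* as markdown emphasis: keep the words, drop the asterisks.
        result = EMPHASIS_PATTERN.sub(lambda m: m.group(1) or m.group(2), result)
    return result.strip()
```

With the default, `filter_text("This is *important*（笑）")` keeps the emphasized word, while `filter_text("*sighs* Fine.", strip_emphasis=True)` removes the marker.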

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- In `cosyvoice_tts_source.py`, `SpeechSynthesizer` is imported unconditionally at module import time, unlike the guarded import pattern used in `qwen_tts_realtime_source.py`; consider wrapping this import in a `try/except ImportError` with a clear error message so environments without `dashscope` don’t crash on import.
- The built-in patterns in `TTSTextFilter.BUILTIN_PATTERNS` aggressively strip all `*...*` and `**...**` segments, which may remove legitimate markdown emphasis rather than just emotion markers; consider either narrowing these regexes or making markdown-based stripping configurable to avoid unexpected loss of content.

## Individual Comments

### Comment 1
<location path="astrbot/core/provider/sources/qwen_tts_realtime_source.py" line_range="145-154" />
<code_context>
+            f.write(audio_bytes)
+        return path
+
+    async def _synthesize(self, text: str) -> bytes | None:
+        """Use CosyVoice SpeechSynthesizer to synthesize speech."""
+        loop = asyncio.get_running_loop()
</code_context>
<issue_to_address>
**issue (bug_risk):** Close the WebSocket session and use the thread-safe drain helper when assembling audio in `_synthesize`.

`_synthesize` reads `callback.audio_chunks` directly while the WebSocket thread may still be appending, bypassing the `drain_audio_chunks()` lock, and it never explicitly closes `QwenTtsRealtime`.

To align with the streaming path and avoid races/resource leaks, you could do something like:

```python
try:
    await loop.run_in_executor(None, _connect_and_send)
    finished = callback.wait_for_finished(timeout=self.timeout)
    # ... error/timeout handling ...
    pcm_data = b"".join(callback.drain_audio_chunks())
    if not pcm_data:
        return None
    return self._pcm_to_wav(pcm_data, sample_rate=24000)
finally:
    try:
        await loop.run_in_executor(None, qwen_tts.close)
    except Exception:
        pass
```

This keeps audio access thread-safe and ensures the WebSocket is closed even on error paths.
</issue_to_address>

### Comment 2
<location path="astrbot/core/provider/sources/cosyvoice_tts_source.py" line_range="16" />
<code_context>
+import os
+import uuid
+
+from dashscope.audio.tts_v2 import AudioFormat, SpeechSynthesizer
+
+from astrbot.core import logger
</code_context>
<issue_to_address>
**issue (bug_risk):** Guard the CosyVoice DashScope import similarly to the Qwen provider to avoid hard ImportError.

This provider imports `SpeechSynthesizer` at module import time without a `try/except`, so if the DashScope TTS v2 API or `audio.tts_v2` isn’t available, importing this module will raise and can break provider loading.

To align with Qwen and fail gracefully, wrap the import in a `try/except ImportError`, set `AudioFormat` and `SpeechSynthesizer` to `None` on failure, and then have `_synthesize` / `get_audio` raise a clear `RuntimeError` when the SDK capability is missing.
</issue_to_address>

### Comment 3
<location path="astrbot/core/utils/tts_text_filter.py" line_range="52" />
<code_context>
+        return result.strip()
+
+
+class FilteredQueue:
+    """异步队列包装器，在 get() 时自动过滤文本。
+
</code_context>
<issue_to_address>
**issue (complexity):** Consider replacing the speculative FilteredQueue wrapper with a simple filtered_get coroutine to keep behavior explicit and avoid a partial queue-like abstraction.

You can keep the text‑filter functionality while avoiding the speculative `FilteredQueue` abstraction and its partial queue façade.

Instead of wrapping `asyncio.Queue` in a class that shadows part of its API and hides the drop behavior inside `get()`, extract a focused helper coroutine that can be used wherever streaming is actually needed:

```python
class TTSTextFilter:
    ...
    @classmethod
    def apply(cls, text: str, custom_rules: list[str] | None = None) -> str:
        ...
        return result.strip()
```

Replace `FilteredQueue` with:

```python
async def filtered_get(
    queue: asyncio.Queue,
    custom_rules: list[str] | None = None,
) -> str | None:
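    """Get one item from the queue; apply TTS filtering if it is a string.

    Returns:
    - None: None was taken from the queue; the caller can treat it as the end signal.
    - Non-empty string: the filtered text.
    - Any other type: returned unchanged (e.g. for control messages).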
    """
    while True:
        item = await queue.get()
        if item is None:
            return None
        if isinstance(item, str):
            filtered = TTSTextFilter.apply(item, custom_rules)
            if filtered:
                return filtered
            # On an empty string, keep fetching the next item from the queue
            continue
        return item
```

Usage in a future streaming TTS consumer stays explicit and predictable:

```python
while True:
    text = await filtered_get(raw_text_queue, custom_rules)
    if text is None:
        break
    # Run TTS on the filtered text
```

This keeps:

- All current filtering behavior.
- No partial queue interface to maintain.
- No surprising `get()` semantics on a “queue‑like” object; the drop/loop behavior is clear at the call site.
</issue_to_address>

sourcery-ai · 2026-04-25T14:19:32Z

+    async def _synthesize(self, text: str) -> bytes | None:
+        """Use Qwen TTS Realtime WebSocket API to synthesize speech."""
+        if QwenTtsRealtime is None:
+            raise RuntimeError(
+                "dashscope SDK missing QwenTtsRealtime. "
+                "Please upgrade the dashscope package to use Qwen TTS Realtime.",
+            )
+
+        callback = _QwenRealtimeCallback()
+        model = self.get_model()


issue (bug_risk): Close the WebSocket session and use the thread-safe drain helper when assembling audio in _synthesize.

_synthesize reads callback.audio_chunks directly while the WebSocket thread may still be appending, bypassing the drain_audio_chunks() lock, and it never explicitly closes QwenTtsRealtime.

To align with the streaming path and avoid races/resource leaks, you could do something like:

```python
try:
    await loop.run_in_executor(None, _connect_and_send)
    finished = callback.wait_for_finished(timeout=self.timeout)
    # ... error/timeout handling ...
    pcm_data = b"".join(callback.drain_audio_chunks())
    if not pcm_data:
        return None
    return self._pcm_to_wav(pcm_data, sample_rate=24000)
finally:
    try:
        await loop.run_in_executor(None, qwen_tts.close)
    except Exception:
        pass
```

This keeps audio access thread-safe and ensures the WebSocket is closed even on error paths.

sourcery-ai · 2026-04-25T14:19:32Z

+import os
+import uuid
+
+from dashscope.audio.tts_v2 import AudioFormat, SpeechSynthesizer


issue (bug_risk): Guard the CosyVoice DashScope import similarly to the Qwen provider to avoid hard ImportError.

This provider imports SpeechSynthesizer at module import time without a try/except, so if the DashScope TTS v2 API or audio.tts_v2 isn’t available, importing this module will raise and can break provider loading.

To align with Qwen and fail gracefully, wrap the import in a try/except ImportError, set AudioFormat and SpeechSynthesizer to None on failure, and then have _synthesize / get_audio raise a clear RuntimeError when the SDK capability is missing.
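
A guarded import along these lines might look like the sketch below. It is one possible shape, not the PR's code: the `optional_attr` and `require_speech_synthesizer` helpers are hypothetical names, and a plain `try/except ImportError` around the import statement would work equally well. Only the module path and class name come from the diff.

```python
import importlib


def optional_attr(module_name: str, attr: str):
    """Return module_name.attr, or None when the module or attribute is unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr, None)


# Module path and class name are taken from the diff; None means the SDK capability is missing.
SpeechSynthesizer = optional_attr("dashscope.audio.tts_v2", "SpeechSynthesizer")


def require_speech_synthesizer():
    """Raise a clear error at call time instead of crashing at import time."""
    if SpeechSynthesizer is None:
        raise RuntimeError(
            "dashscope SDK missing SpeechSynthesizer. "
            "Please install or upgrade the dashscope package to use CosyVoice TTS."
        )
    return SpeechSynthesizer
```

The provider's synthesis methods would call `require_speech_synthesizer()` first, so a missing SDK surfaces as a readable error only when CosyVoice TTS is actually used.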

sourcery-ai · 2026-04-25T14:19:32Z

+        return result.strip()
+
+
+class FilteredQueue:


issue (complexity): Consider replacing the speculative FilteredQueue wrapper with a simple filtered_get coroutine to keep behavior explicit and avoid a partial queue-like abstraction.

You can keep the text‑filter functionality while avoiding the speculative FilteredQueue abstraction and its partial queue façade.

Instead of wrapping asyncio.Queue in a class that shadows part of its API and hides the drop behavior inside get(), extract a focused helper coroutine that can be used wherever streaming is actually needed:

```python
class TTSTextFilter:
    ...
    @classmethod
    def apply(cls, text: str, custom_rules: list[str] | None = None) -> str:
        ...
        return result.strip()
```

Replace FilteredQueue with:

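```python
async def filtered_get(
    queue: asyncio.Queue,
    custom_rules: list[str] | None = None,
) -> str | None:
    """Get one item from the queue; apply TTS filtering if it is a string.

    Returns:
    - None: None was taken from the queue; the caller can treat it as the end signal.
    - Non-empty string: the filtered text.
    - Any other type: returned unchanged (e.g. for control messages).
    """
    while True:
        item = await queue.get()
        if item is None:
            return None
        if isinstance(item, str):
            filtered = TTSTextFilter.apply(item, custom_rules)
            if filtered:
                return filtered
            # On an empty string, keep fetching the next item from the queue
            continue
        return item
```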

Usage in a future streaming TTS consumer stays explicit and predictable:

```python
while True:
    text = await filtered_get(raw_text_queue, custom_rules)
    if text is None:
        break
    # Run TTS on the filtered text
```

This keeps:

All current filtering behavior.

No partial queue interface to maintain.

No surprising get() semantics on a “queue‑like” object; the drop/loop behavior is clear at the call site.

yuxwd and others added 6 commits April 18, 2026 22:08

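feat:: add removal of content enclosed in symbols such as ** and （） from TTS requests

09c71b2

feat:: add removal of content enclosed in symbols such as ** and （） from TTS requests

d76715d

feat:: add removal of content enclosed in symbols such as ** and （） from TTS requests; add translations to the three language files

c030c3b

fix: update frontend build files

630517e

fix: fix the text_queue variable in astrbot/core/astr_agent_run_util.py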

1f6df28

auto-assign Bot requested review from Raven95676 and Soulter April 18, 2026 15:04

sourcery-ai Bot reviewed Apr 18, 2026

dosubot Bot added the size:XL (This PR changes 500-999 lines, ignoring generated files.) and area:provider (The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner.) labels Apr 18, 2026

gemini-code-assist Bot reviewed Apr 18, 2026

Comment thread astrbot/core/provider/sources/qwen_tts_realtime_source.py Outdated

Comment thread astrbot/core/provider/sources/qwen_tts_realtime_source.py

Comment thread astrbot/core/provider/sources/cosyvoice_tts_source.py

Comment thread astrbot/core/utils/tts_text_filter.py

yuxwd closed this Apr 18, 2026

fix: fix the inconsistent text_queue variable name in astr_agent_run_util.py, unifying it as tts_text_queue

4894177

yuxwd reopened this Apr 18, 2026

Soulter force-pushed the master branch 2 times, most recently from faf411f to 0068960 Compare April 19, 2026 09:50

yuxwd and others added 3 commits April 25, 2026 12:32

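fix: revert "fix: update frontend build files"

75a6757

fix: 1. Change the dashscope.api_key setting to be passed in through the QwenTtsRealtime constructor argument; add non-wss:// UR…

e4abb39

…L warning log 2. FilteredQueue no longer inherits from asyncio.Queue and uses composition instead; a warning is logged when a custom regex is invalid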

Merge branch 'master' into master

cabc8a7

gemini-code-assist Bot reviewed Apr 25, 2026

Comment thread astrbot/core/provider/sources/qwen_tts_realtime_source.py Outdated

Comment thread astrbot/core/utils/tts_text_filter.py Outdated

Comment thread astrbot/core/provider/sources/qwen_tts_realtime_source.py Outdated

Comment thread astrbot/core/astr_agent_run_util.py Outdated

yuxwd and others added 2 commits April 25, 2026 17:20

Merge branch 'AstrBotDevs:master' into master

310f1c9

gemini-code-assist Bot reviewed Apr 25, 2026

Comment thread astrbot/core/pipeline/result_decorate/stage.py

Comment thread astrbot/core/utils/tts_text_filter.py Outdated

Update astrbot/core/utils/tts_text_filter.py

0517df4

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

gemini-code-assist Bot reviewed Apr 25, 2026

Comment thread astrbot/core/pipeline/result_decorate/stage.py

Comment thread astrbot/core/utils/tts_text_filter.py

sourcery-ai Bot reviewed Apr 25, 2026

yuxwd mentioned this pull request Apr 25, 2026

[Bug] dashscope_tts fails testing with qwen3-tts-vc-realtime-2026-01-15, reporting the error Invalid message type #6045

Open

yuxwd closed this Apr 25, 2026

yuxwd reopened this Apr 25, 2026

yuxwd closed this Apr 26, 2026
